Abstract

A happy married life contributes greatly to a happy and healthy life, yet divorce cases are
increasing rapidly. According to one study, the worldwide divorce rate was 4.08 per 1,000 married
people in May 2022. There is therefore a need for effective divorce prediction, which can help a
marriage counselor or therapist understand how serious a case is. For this research, a dataset was
collected from the UCI repository; it contains the answers that couples gave to a set of questions.
First, the fields of the dataset were ranked using different ranking methods, namely Information
Gain, One R, Gain Ratio and ReliefF, and the most important fields affecting divorce were selected.
Then, different classification algorithms, namely Logistic Regression, Naive Bayes, SGD, Decision
Tree, Random Forest and Multilayer Perceptron, were applied and their accuracy was compared. The
algorithms were run first with all fields and then with 6 and 7 selected fields, using 50:50, 66:34
and 80:20 training/testing splits and 10-fold cross validation. When 7 fields were combined, 100%
accuracy was obtained with the Decision Tree, Random Forest and Multilayer Perceptron algorithms.

Keywords: Divorce, Feature Selection, Decision Tree, Random Forest, Multilayer Perceptron, 10-fold
cross validation.


Introduction 

Marriage is a legally and socially sanctioned union regulated by laws, rules, customs, beliefs and
attitudes that determine the rights and duties of the couple. Marriage brings happiness because it
is a lifelong commitment between two people who love and care for each other. It is a partnership
that can weather the storms of life and provide stability and support in good times and bad. It is
built on trust, mutual respect and communication. When these are present, a couple can work through
anything that comes their way; otherwise, there is a chance of divorce. Divorce may have a
significant emotional, financial and social impact on individuals and families. It can lead to
feelings of sadness, anger and grief, as well as a sense of loss and trauma. Children of divorced
parents can also experience emotional difficulties such as anxiety and depression. Financially,
divorce can lead to a decrease in income and an increase in expenses, as well as potential legal
fees. Socially, divorce can change relationships with friends and family members and can affect
one's sense of identity and self-worth. It is important for individuals going through a divorce to
seek support from friends, family and professionals to help them cope with these challenges. For
this research, a dataset was selected from the UCI repository; it contains questions asked of
couples by therapists or marriage counselors, together with the answers to those questions. This
paper gives a clear view of the fields that really affect a relationship or a marriage. We first
used different ranking algorithms to rank the variables according to their importance. Then
different classification algorithms, namely Logistic Regression, Naive Bayes, SGD, Decision Tree,
Random Forest and Multilayer Perceptron, were used to obtain accuracy, kappa value and ROC area.
These algorithms were run with 50:50, 66:34 and 80:20 training/testing splits and with 10-fold
cross validation. When 7 fields were combined, 100% accuracy and a value of 1 for both kappa and
ROC area were obtained with the Decision Tree, Random Forest and Multilayer Perceptron algorithms.


Literature Study 


Author and Year | Data source | Method used | Accuracy
(A. Sharma, A. S. Chudhey & M. Singh, 2021) | UCIMLR | Machine learning, K-Nearest Neighbours, Logistic Regression, Perceptron classifier, Decision Trees | [illegible] (Perceptron classifier)
(M. S. Devi, D. Umanandhini, A. P. S. Anandaraj, S. Sridevi, 2021) | UCIMLR | Random Forest, precision, accuracy, classification, cross validation | 98% (Random Forest)
(P. Ranjitha & A. Prabhu, 2020) | UCIMLR | Bio-inspired optimization Algorithm, Particle Swarm Optimization | 09.67% (Particle Swarm Optimization)
(Ibrahim M. Nasser, 2019) | UCIMLR | Data Mining; Machine Learning; Deep Learning; [illegible] Analysis; Artificial Neural Network; Divorce Prediction | 100% (Neural Network)
(M. K. Yontem, K. Adem, T. Ilhan & S. Kilicarslan, 2019) | UCIMLR | artificial neural network, divorce, divorce prediction | 922% (Artificial Neural Network)


Methodology 


Different ranking methods and classification algorithms were applied to the training/testing sets
to obtain the maximum accuracy. The dataset contains 55 fields, of which one is the dependent
variable (‘class’) and the others are independent variables. First, all the variables were combined
and evaluated with Logistic Regression, Naive Bayes, SGD, Decision Tree, Random Forest and
Multilayer Perceptron using 10-fold cross validation and 50:50, 66:34 and 80:20 training/testing
sets, but 100% accuracy was not obtained. After that, the top 6 fields selected by each of the
feature selection methods (Information Gain, One R, Gain Ratio and Relief F) were used separately,
and then the top 7 fields were used, which achieved 100% accuracy with the Decision Tree, Random
Forest and Multilayer Perceptron algorithms.
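
The evaluation protocol described above (three hold-out splits plus 10-fold cross validation) can
be illustrated with a minimal sketch. The helper names `holdout_split` and `k_fold_splits`, the
shuffling seed and the row count of 170 are assumptions made for this illustration; they are not
taken from the paper, and the sketch is not the tooling used to produce the reported results.

```python
import random


def holdout_split(n_rows, train_fraction, seed=0):
    """Shuffle the row indices and cut them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_rows, k=10, seed=0):
    """Yield (train, test) index lists so that every row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


N_ROWS = 170  # assumed row count, used here for illustration only

for fraction in (0.50, 0.66, 0.80):
    train, test = holdout_split(N_ROWS, fraction)
    print(f"{fraction:.0%} training: {len(train)} train rows, {len(test)} test rows")

fold_sizes = [len(test) for _, test in k_fold_splits(N_ROWS, k=10)]
print("10-fold test-set sizes:", fold_sizes)
```

In the hold-out case a classifier is fitted on the `train` rows and scored once on the `test` rows;
in the cross-validation case it is fitted ten times and the ten test-fold scores are averaged.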

The dataset, “Divorce Predictors data set”, was obtained for this research from
“https://archive.ics.uci.edu/ml/datasets/Divorce+Predictors+data+set”.


Sl. no | Variable | Variable Description
1 | If one of us apologizes when our discussion deteriorates, the discussion ends. | False(0), True(1)
2 | I know we can ignore our differences, even if things get hard sometimes. | False(0), True(1)
3 | When we need it, we can take our discussions with my spouse from the beginning and correct it. | False(0), True(1)
4 | When I discuss with my spouse, to contact him will eventually work. | False(0), True(1)
5 | The time I spent with my wife is special for us. | False(0), True(1)
6 | We don't have time at home as partners. | False(0), True(1)
7 | We are like two strangers who share the same environment at home rather than family. | False(0), True(1)
8 | I enjoy our holiday with my wife. | False(0), True(1)
9 | I enjoy travelling with my wife. | False(0), True(1)
10 | Most of our goals are common to my spouse. | False(0), True(1)
11 | I think that one day in the future, when I look back, I see that my spouse and I have been in harmony with each other. | False(0), True(1)
12 | My spouse and I have similar values in terms of personal freedom. | False(0), True(1)
13 | My spouse and I have similar sense of entertainment. | False(0), True(1)
14 | Most of our goals for people (children, friends, etc) are the same. | False(0), True(1)
15 | Our dreams with my spouse are similar and harmonious. | False(0), True(1)
16 | We’re compatible with my spouse about what love should be. | False(0), True(1)
17 | We share the same views about being happy in our life with my spouse. | False(0), True(1)
18 | My spouse and I have similar ideas about how marriage should be. | False(0), True(1)
19 | My spouse and I have similar ideas about how roles should be in marriage. | False(0), True(1)
20 | My spouse and I have similar values in trust. | False(0), True(1)
21 | I know exactly what my wife likes. | False(0), True(1)
22 | I know how my spouse wants to be taken care of when she/he is sick. | False(0), True(1)
23 | I know my spouse’s favorite food. | False(0), True(1)
24 | I can tell you what kind of stress my spouse is facing in her/his life. | False(0), True(1)
25 | I have knowledge of my spouse’s inner world. | False(0), True(1)
26 | I know my spouse’s basic anxieties. | False(0), True(1)
27 | I know what my spouse’s current sources of stress are. | False(0), True(1)
28 | I know my spouse’s hopes and wishes. | False(0), True(1)
29 | I know my spouse very well. | False(0), True(1)
30 | I know my spouse’s friends and their social relationships. | False(0), True(1)
31 | I feel aggressive when I argue with my spouse. | False(0), True(1)
32 | When discussing with my spouse, I usually use expressions such as ‘you always’ or ‘you never’. | False(0), True(1)
33 | I can use negative statements about my spouse’s personality during our discussion. | False(0), True(1)
34 | I can use offensive expression during our discussion. | False(0), True(1)
35 | I can insult my spouse during our discussions. | False(0), True(1)
36 | I can be humiliating during our discussions. | False(0), True(1)
37 | My discussions with my spouse are not calm. | False(0), True(1)
38 | I hate my spouse’s way of open a subject. | False(0), True(1)
39 | Our discussions often occur suddenly. | False(0), True(1)
40 | We’re just starting a discussion before I know what’s going on. | False(0), True(1)
41 | When I talk to my spouse about something, my calm suddenly breaks. | False(0), True(1)
42 | When I argue with my spouse’s way of open a subject. | False(0), True(1)
43 | I mostly stay silent to calm the environment a little bit. | False(0), True(1)
44 | Sometimes I think it’s good for me to leave home for a while. | False(0), True(1)
45 | I’d rather stay silent than discuss with my spouse. | False(0), True(1)
46 | Even if I’m right in the discussion, I stay silent to hurt my spouse. | False(0), True(1)
47 | When I discuss with my spouse, I stay silent because I am afraid of not being able to control my anger. | False(0), True(1)
48 | I feel right in our discussion. | False(0), True(1)
49 | I have nothing to do with what I’ve been accused of. | False(0), True(1)
50 | I’m not actually the one who’s guilty about what I’m accused of. | False(0), True(1)
51 | I’m not the one who’s wrong about problems at home. | False(0), True(1)
52 | I wouldn’t hesitate to tell my spouse about her/his inadequacy. | False(0), True(1)
53 | When I discuss, I remind my spouse of her/his inadequacy. | False(0), True(1)
54 | I’m not afraid to tell my spouse about her/his incompetence. | False(0), True(1)


The different ranking methods and algorithms are described below. 

1) Information Gain: - The information gain of an attribute is the entropy of the dataset before it
is split on that attribute minus the weighted entropy of the subsets obtained after the split [1].

2) One R: - One R evaluates the worth of individual attributes (features) in a dataset, with the
goal of identifying which attributes are most useful for a particular task. The method builds a
simple one-level decision tree, called a "OneR" model, which uses only one attribute at a time to
make predictions. The accuracy of the model is calculated for each attribute, and the attribute
with the highest accuracy is considered the most useful [1].

3) Gain Ratio: - Gain Ratio (GR) is a modification of Information Gain (IG). IG is used to induce
decision trees in ID3, while Gain Ratio is used in C4.5, the successor of ID3 [1].

4) Relief F: - Relief F is an instance-based attribute ranking technique. It randomly samples an
instance from the data and then looks for its nearest neighbors of the same class and of the
opposite class. The attribute values of the nearest neighbors are compared with those of the
sampled instance, and the relevance score of each attribute is updated accordingly [1].
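
The first three ranking criteria can be sketched in a few lines of plain Python. This is a minimal
illustration under stated assumptions, not the implementation used for the reported results: the
function names, the toy answers `q_a`, `q_b`, `q_c` and the toy labels are all hypothetical, and
Relief F is left out because it needs a distance-based neighbor search.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself (split information)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def one_r_accuracy(feature, labels):
    """Accuracy of a one-attribute rule that predicts the majority class per feature value."""
    correct = 0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        correct += Counter(subset).most_common(1)[0][1]
    return correct / len(labels)


# Hypothetical toy data: three binary answers for six couples and a binary class label.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "q_a": [1, 1, 1, 0, 0, 0],
    "q_b": [1, 0, 1, 0, 1, 0],
    "q_c": [1, 1, 0, 0, 0, 0],
}

ranking = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
for name in ranking:
    column = features[name]
    print(name, round(information_gain(column, labels), 3),
          round(gain_ratio(column, labels), 3), round(one_r_accuracy(column, labels), 3))
```

Taking the first 6 or 7 names from such a ranking corresponds to the field selection step used in
this paper.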

After ranking, the following classification algorithms were used in this research work:

a) Logistic Regression: - Logistic Regression is a classification algorithm that predicts the
output class for a given input. It does so with a cost function built around a sigmoid function,
which gives the likelihood that the input belongs to a particular class [5].

The formula for logistic regression is represented by the logistic function or sigmoid function: 
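
In its standard textbook form, assuming a feature vector $x$, a weight vector $w$ and a bias $b$
(symbols introduced here for illustration), the sigmoid maps a linear score to a probability:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w^{\top} x + b, \qquad P(y = 1 \mid x) = \sigma(z)
```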


b) Naive Bayes: - Naive Bayes is a probabilistic machine learning algorithm that classifies
instances using Bayes' theorem. It is called "naive" because it assumes that all features are
independent of each other, which is often not the case in real-world data. The features are also
given equal weight in making predictions. The algorithm determines which outcome the observed
feature values make more likely and assigns the class accordingly [5].


The formula for Naive Bayes is based on Bayes' theorem, which states that: 
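
In standard notation, assuming a class $C$ and a feature vector $x = (x_1, \dots, x_n)$ (symbols
introduced here for illustration), Bayes' theorem and the naive independence assumption read:

```latex
P(C \mid x) = \frac{P(x \mid C)\, P(C)}{P(x)}, \qquad
P(x \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class is the one that maximizes $P(C) \prod_{i} P(x_i \mid C)$, since $P(x)$ is the
same for every class.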


c) SGD: - SGD stands for Stochastic Gradient Descent, an optimization algorithm used to minimize 
an objective function by updating the parameters in the direction of the negative gradient of the loss 
function. Unlike batch gradient descent, which updates the parameters using the average gradient of 
the loss function with respect to the parameters over the entire training set, SGD updates the 
parameters after each training example. SGD is efficient for large datasets and is often used in training 
deep learning models, where the size of the dataset can make batch gradient descent computationally 
infeasible. [12] 
The formula for Stochastic Gradient Descent (SGD) is: 
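
A common way to write the update, assuming parameters $\theta$, a learning rate $\eta$ and a loss
$L$ evaluated on a single training example $(x_i, y_i)$ (notation introduced here for
illustration), is:

```latex
\theta_{t+1} = \theta_{t} - \eta \, \nabla_{\theta} L\left(\theta_{t};\, x_i, y_i\right)
```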


d) Decision Tree: - A decision tree is a type of supervised machine learning algorithm that is
commonly used in data analysis, classification, and prediction tasks [5]. A decision tree works by
recursively partitioning a dataset into subsets based on the values of one or more input variables,
choosing each split with a heuristic measure [5]. The recursive partitioning produces a tree-like
structure.

e) Random Forest: - Random Forest is an ensemble learning method for classification and regression
tasks. It combines multiple decision trees to make a prediction. In a Random Forest, each tree in
the ensemble is built using a random subset of the training data and a random subset of the
features. The final prediction is made by combining the predictions of each tree in the forest
through a voting process or by taking the average of the predictions [25].

The formula for a Random Forest prediction can be expressed as a combination of the predictions 
made by individual decision trees in the forest. 
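
One common way to write this combination, assuming a forest of $T$ trees $h_1, \dots, h_T$
(notation introduced here for illustration), is a majority vote for classification and an average
for regression:

```latex
\hat{y}_{\text{class}}(x) = \operatorname{mode}\{\, h_1(x), \dots, h_T(x) \,\}, \qquad
\hat{y}_{\text{reg}}(x) = \frac{1}{T} \sum_{t=1}^{T} h_t(x)
```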

f) Multilayer Perceptron (MLP): - A Multilayer Perceptron (MLP) is a type of artificial neural
network that consists of multiple layers of interconnected nodes, or neurons [5]. It is also known
as a feed forward neural network, as the data flows through the network in a feed forward manner,
from input layer to output layer, without looping back. The input layer receives the input data and
passes it through the hidden layers, where the data is transformed and processed, before the output
is finally produced at the output layer. Each neuron in a hidden layer receives inputs from the
neurons in the previous layer, computes a weighted sum, and then applies an activation function to
produce an output [24].
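
The feed forward computation described above can be sketched as follows. This is a minimal
illustration only: the sigmoid activation, the layer sizes and the hand-picked weights are
assumptions made for the example, and no training step is shown.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """For each neuron: weighted sum of the inputs plus a bias, then the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in order, without looping back."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation


# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.4, 0.1], [-0.3, 0.8, 0.2]], [0.0, 0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                             # output layer
]

output = mlp_forward([1, 0, 1], layers)
print("output activation:", output)
```

With a single sigmoid output neuron, the output can be read as a score between 0 and 1 for the
positive class and thresholded to obtain a class label.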


[Figure: Multilayer Perceptron with an input layer, a hidden layer and an output layer]


Results and Discussions 

With all the above algorithms, accuracy, ROC area and kappa score were calculated. First, all the
fields of the dataset were used together to calculate the accuracy, ROC area and kappa score of
each classification algorithm with 50-50, 66-34 and 80-20 training/testing splits and with 10-fold
cross validation.
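
As a reminder of what the three reported measures mean, the following minimal sketch computes
accuracy, Cohen's kappa and ROC area for a binary problem. The function names and the toy labels,
predictions and scores are invented for illustration; they are not taken from the dataset or from
the experiments reported here.

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Agreement between actual and predicted classes, corrected for chance agreement."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    classes = set(actual) | set(predicted)
    expected = sum((actual.count(c) / n) * (predicted.count(c) / n) for c in classes)
    if expected == 1.0:
        return float("nan")  # undefined when only one class occurs in both lists
    return (observed - expected) / (1.0 - expected)


def roc_area(actual, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example with six instances.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6]  # classifier scores for the positive class

print("accuracy:", round(accuracy(actual, predicted), 3))
print("kappa:", round(cohen_kappa(actual, predicted), 3))
print("ROC area:", round(roc_area(actual, scores), 3))
```

A classifier that labels every test instance correctly has an accuracy of 100% and a kappa of 1,
and if its scores also separate the two classes perfectly its ROC area is 1.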
Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 83.52 | 91.76 | 85.88 | 89.41 | 85.88 | 84.70
66/34 | 78.57 | 85.71 | 81.25 | 82.14 | 82.14 | 82.14
80/20 | 73.52 | 80.14 | 77.94 | 77.94 | 81.61 | 73.52
10-fold cross validation | 79.41 | 87.64 | 80.58 | 89.41 | 80 | 85.88

Kappa

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.46 | 0.67 | 0.54 | 0.67 | 0.29 | 0.53
66/34 | 0.24 | 0.46 | 0.32 | 0.53 | 0.07 | 0.43
80/20 | 0.25 | 0.34 | 0.21 | 0.47 | 0.14 | 0.25
10-fold cross validation | 0.41 | 0.56 | 0.43 | 0.67 | 0.08 | 0.60

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.81 | 0.88 | 0.78 | 0.88 | 0.90 | 0.89
66/34 | 0.75 | 0.79 | 0.64 | 0.91 | 0.73 | 0.78
80/20 | 0.63 | 0.67 | 0.59 | 0.79 | 0.67 | 0.65
10-fold cross validation | 0.79 | 0.87 | 0.72 | 0.89 | 0.90 | 0.88


After examining the results, it was noticed that 100% accuracy was not achieved by any of the
algorithms. Ranking methods were therefore used to choose the most important fields from the
dataset, and the classification algorithms were tested on the chosen fields. First, the top 6
fields from each ranking method were tested with each of the above classification algorithms. Then
the top 7 fields were tested in the same way, and 100% accuracy was found with the Decision Tree,
Random Forest and Multilayer Perceptron algorithms.


Information Gain:-

When 6 fields are combined together and tested.
Fields are - [25, 22, 27, 5, 13, and 10]

Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 94.11 | 95.29 | 94.11 | 94.11 | 97.64 | 94.11
66/34 | 94.64 | 94.64 | 94.64 | 94.64 | 95.53 | 94.64
80/20 | 95.58 | 92.64 | 96.32 | 96.32 | 96.32 | 93.38
10-fold cross validation | [illegible] | 92.35 | 94.11 | 95.29 | 98.23 | 95.88

Kappa

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.81 | 0.83 | 0.81 | 0.81 | 0.92 | 0.81
66/34 | 0.81 | 0.80 | 0.81 | 0.81 | 0.84 | 0.81
80/20 | 0.83 | 0.70 | 0.86 | 0.85 | 0.86 | 0.76
10-fold cross validation | 0.67 | 0.65 | 0.76 | 0.81 | 0.93 | 0.84

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.96 | 0.94 | 0.89 | 0.95 | 0.99 | 0.90
66/34 | 0.91 | 0.95 | 0.84 | 0.84 | 0.97 | 0.92
80/20 | 0.91 | 0.93 | 0.90 | 0.90 | 0.97 | 0.91
10-fold cross validation | 0.96 | 0.96 | 0.85 | 0.97 | 0.99 | 0.91

When 7 fields are combined together and tested.
Fields are - [25, 22, 27, 5, 13, 10, and 33]

Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 90.58 | 89.41 | 95.29 | 100 | 100 | 95.29
66/34 | 91.07 | 87.50 | 96.42 | 100 | 100 | 100
80/20 | 86.02 | 84.55 | 66.17 | 91.17 | 94.11 | 100
10-fold cross validation | [illegible] | 86.47 | 94.70 | 100 | 100 | 100

Kappa Score

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.74 | 0.71 | 0.88 | 1 | 1 | 0.88
66/34 | 0.74 | 0.61 | 0.90 | 1 | 1 | 1
80/20 | 0.59 | 0.55 | 0.06 | 0.74 | 0.83 | 1
10-fold cross validation | 0.65 | 0.65 | 0.87 | 1 | 1 | 1

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.95 | 0.95 | 0.96 | 1 | 1 | 0.94
66/34 | 0.96 | 0.96 | 0.97 | 1 | 1 | 1
80/20 | 0.94 | 0.89 | 0.53 | 0.99 | 0.96 | 1
10-fold cross validation | 0.93 | 0.92 | 0.96 | 1 | 1 | 1

One R:-

When 6 fields are combined together and tested.
Fields are - 54, 20, 18, 17, 16, and 19

Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 94.12 | 95.29 | 94.11 | 96.47 | 97.64 | 94.11
66/34 | 94.64 | 94.64 | 94.64 | 96.42 | 95.53 | 94.64
80/20 | 95.58 | 92.64 | 96.32 | 83.08 | 96.32 | 93.38
10-fold cross validation | [illegible] | 92.35 | 94.11 | 93.52 | 98.23 | 95.88

Kappa Score

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.81 | 0.83 | 0.81 | 0.88 | 0.92 | 0.81
66/34 | 0.81 | 0.80 | 0.81 | 0.87 | 0.84 | 0.81
80/20 | 0.83 | 0.70 | 0.86 | 0 | 0.86 | 0.76
10-fold cross validation | [illegible] | 0.65 | 0.76 | 0.79 | 0.93 | 0.84

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.96 | 0.94 | 0.89 | 0.97 | 0.99 | 0.90
66/34 | 0.91 | 0.95 | 0.89 | 0.91 | 0.97 | 0.92
80/20 | 0.91 | 0.93 | 0.90 | 0.50 | 0.97 | 0.91
10-fold cross validation | [illegible] | 0.96 | 0.85 | 0.99 | 0.99 | 0.91

When 7 fields are combined together and tested.
Fields are - 54, 20, 18, 17, 16, 19 and 21

Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 90.58 | 89.41 | 95.29 | 97.64 | 100 | 95.29
66/34 | 91.07 | 87.5 | 96.42 | 97.32 | 100 | 100
80/20 | 86.02 | 84.55 | 66.17 | 74.26 | 94.11 | 100
10-fold cross validation | [illegible] | 86.47 | 94.70 | 98.23 | 100 | 100

Kappa Score

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.74 | 0.71 | 0.88 | 0.93 | 1 | 0.88
66/34 | 0.74 | 0.61 | 0.90 | 0.92 | 1 | 1
80/20 | 0.59 | 0.55 | 0.06 | 0.23 | 0.83 | 1
10-fold cross validation | 0.65 | 0.65 | 0.87 | 0.95 | 1 | 1

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.95 | 0.95 | 0.96 | 0.99 | 1 | 0.94
66/34 | 0.96 | 0.96 | 0.97 | 0.97 | 1 | 1
80/20 | 0.94 | 0.89 | 0.53 | 0.54 | 0.96 | 1
10-fold cross validation | 0.93 | 0.92 | 0.96 | 0.99 | 1 | 1

Gain Ratio:-

When 6 fields are involved
Field attributes are - 25, 22, 27, 5, 13, and 10

Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 94.11 | 95.29 | 94.11 | 96.47 | 97.64 | 94.11
66/34 | 94.64 | 94.64 | 94.64 | 96.42 | 95.53 | 94.64
80/20 | 95.58 | 92.64 | 96.32 | 83.08 | 96.32 | 93.38
10-fold cross validation | [illegible] | 92.35 | 94.11 | 93.52 | 98.23 | 95.88

Kappa Score

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.81 | 0.83 | 0.81 | 0.88 | 0.92 | 0.81
66/34 | 0.81 | 0.80 | 0.81 | 0.87 | 0.84 | 0.81
80/20 | 0.83 | 0.70 | 0.86 | 0 | 0.86 | 0.76
10-fold cross validation | [illegible] | 0.65 | 0.76 | 0.79 | 0.93 | 0.84

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.96 | 0.94 | 0.89 | 0.97 | 0.99 | 0.90
66/34 | 0.91 | 0.95 | 0.89 | 0.91 | 0.97 | 0.92
80/20 | 0.91 | 0.93 | 0.90 | 0.50 | 0.97 | 0.91
10-fold cross validation | [illegible] | 0.96 | 0.85 | 0.99 | 0.99 | 0.91

When 7 fields are involved
Field attributes are - 25, 22, 27, 5, 13, 10, and 33

Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 90.58 | 89.41 | 95.29 | 97.64 | 100 | 95.29
66/34 | 91.07 | 87.5 | 96.42 | 97.32 | 100 | 100
80/20 | 86.02 | 84.55 | 66.17 | 74.26 | 94.11 | 100
10-fold cross validation | [illegible] | 86.47 | 94.70 | 98.23 | 100 | 100

Kappa Score

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.74 | 0.71 | 0.88 | 0.93 | 1 | 0.88
66/34 | 0.74 | 0.61 | 0.90 | 0.92 | 1 | 1
80/20 | 0.59 | 0.55 | 0.06 | 0.23 | 0.83 | 1
10-fold cross validation | [illegible] | 0.65 | 0.87 | 0.95 | 1 | 1

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.95 | 0.95 | 0.96 | 0.99 | 1 | 0.94
66/34 | 0.96 | 0.96 | 0.97 | 0.97 | 1 | 1
80/20 | 0.94 | 0.89 | 0.53 | 0.54 | 0.96 | 1
10-fold cross validation | [illegible] | 0.92 | 0.96 | 0.99 | 1 | 1

Relief F:-

When 6 fields are involved
Field attributes are - 25, 22, 27, 1, 16, and 33

Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 94.11 | 95.29 | 94.11 | 96.47 | 97.64 | 94.11
66/34 | 94.64 | 94.64 | 94.64 | 96.42 | 95.53 | 94.64
80/20 | 95.58 | 92.64 | 96.32 | 83.08 | 96.32 | 93.38
10-fold cross validation | [illegible] | 92.35 | 94.11 | 93.52 | 98.23 | 95.88

Kappa

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.81 | 0.83 | 0.81 | 0.88 | 0.92 | 0.81
66/34 | 0.81 | 0.80 | 0.81 | 0.87 | 0.84 | 0.81
80/20 | 0.81 | 0.70 | 0.86 | 0 | 0.86 | 0.76
10-fold cross validation | 0.67 | 0.65 | 0.76 | 0.79 | 0.93 | 0.84

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.96 | 0.94 | 0.89 | 0.97 | 0.99 | 0.90
66/34 | 0.91 | 0.95 | 0.89 | 0.91 | 0.97 | 0.92
80/20 | 0.91 | 0.93 | 0.90 | 0.50 | 0.97 | 0.91
10-fold cross validation | [illegible] | 0.96 | 0.85 | 0.99 | 0.99 | 0.91

When 7 fields are involved
Field attributes are - 25, 22, 27, 1, 16, 33, and 37

Accuracy (%)

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 90.58 | 89.41 | 95.29 | 97.64 | 100 | 95.29
66/34 | 91.07 | 87.5 | 96.42 | 97.32 | 100 | 100
80/20 | 86.02 | 84.55 | 66.17 | 74.26 | 94.11 | 100
10-fold cross validation | [illegible] | 86.47 | 94.70 | 98.23 | 100 | 100

Kappa Score

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.74 | 0.71 | 0.88 | 0.93 | 1 | 0.88
66/34 | 0.74 | 0.61 | 0.90 | 0.92 | 1 | 1
80/20 | 0.59 | 0.55 | 0.06 | 0.23 | 0.83 | 1
10-fold cross validation | [illegible] | 0.65 | 0.87 | 0.95 | 1 | 1

ROC Area

Training-Testing Data | Logistic Regression | Naive Bayes | SGD | Decision Tree | Random Forest | MLP
50/50 | 0.95 | 0.95 | 0.96 | 0.99 | 1 | 0.94
66/34 | 0.96 | 0.96 | 0.97 | 0.97 | 1 | 1
80/20 | 0.94 | 0.89 | 0.53 | 0.54 | 0.96 | 1
10-fold cross validation | [illegible] | 0.92 | 0.96 | 0.99 | 1 | 1


From the above results it is observed that:

• When all fields are combined, 100% accuracy is not achieved.

• When the top 6 fields from each ranking method are combined and tested with the classification
algorithms, 100% accuracy is again not achieved by any classification algorithm.

• Using the Information Gain ranking method with 7 fields, 100% accuracy, kappa=1 and ROC=1 are
achieved by the Decision Tree and Random Forest algorithms for the 50-50 split. For the 66-34 split
and 10-fold cross validation, 100% accuracy, kappa=1 and ROC=1 are achieved by the Decision Tree,
Random Forest and Multilayer Perceptron algorithms. For the 80-20 split, only MLP gave 100%
accuracy, kappa=1 and ROC=1.

• When the One R, Gain Ratio or Relief F ranking method is used with 7 fields, 100% accuracy,
kappa=1 and ROC=1 are achieved only by Random Forest for the 50-50 split. For the 66-34 split and
10-fold cross validation, 100% accuracy, kappa=1 and ROC=1 are achieved by the Random Forest and
Multilayer Perceptron algorithms. For the 80-20 split, only MLP gave 100% accuracy, kappa=1 and
ROC=1.


Conclusion 

In conclusion, this paper evaluated prediction accuracy on the divorce prediction dataset using
50-50, 66-34 and 80-20 training/testing splits and 10-fold cross validation. First, different
feature selection methods were used to select the important fields. Then the 7 most important
features were combined and tested with different classification algorithms. The results showed that
the Random Forest, MLP and Decision Tree classifiers reached an accuracy of 100%, with kappa=1 and
ROC=1, in several cases. These results demonstrate the effectiveness of the Random Forest, MLP and
Decision Tree classifiers for divorce prediction on this dataset. This paper will help marriage
counselors and therapists understand the most influential reasons behind divorce cases. It will
also help couples understand the factors that really affect relationships and marriages, because a
happy and healthy marriage keeps a couple happy and free of depression.


Future References 

• The same machine learning models can potentially be used to classify data from different fields.
However, rigorous testing and validation are essential, and the approach should be justified by
highlighting the novelty and unique challenges of the new domain. For example, a classification
model that has been trained on medical data could potentially be used to classify data from a
different field, such as financial data. The effectiveness of the model in a new domain will depend
on several factors, such as the similarity between the two domains, the availability and quality of
data, and the suitability of the model architecture and parameters.

• This analysis was made on a secondary dataset collected from the UCI ML repository. We analyzed
this dataset with several algorithms to predict the likelihood of divorce based on various
demographic and social factors. Our next target is to collect real-life data from our own society
and test it against the model derived from the analysis of the secondary dataset, because real-life
data may contain additional or different variables or factors that could affect the prediction
accuracy of the model. By testing the model on real-life data, we hope to validate its
effectiveness and improve its predictive accuracy for practical use in the future.